Self-organising Map Techniques for Graph Data: Applications to Clustering of XML Documents
Abstract
Neural networks are among the main techniques used in data mining. Popular architectures include multilayer perceptrons, self-organising maps, and support vector machines. However, most of these techniques have been applied to problems in which the inputs are vectors, often of fixed dimension. When the inputs are not naturally expressed as vectors, they are made to conform to a fixed-dimension vectorial format. An image, for example, may be more conveniently expressed as a graph: the image of a house can be expressed as a tree, with the root node (level 0) being the house; the windows, walls, and doors expressed as its children (level 1); and the details of the windows, walls, and doors expressed as children (level 2) of the level 1 nodes, and so on. The nodes are described by attributes (features, which may express colour, texture, or dimensions), and their relationships with one another are described by links. Such inputs can be made to conform to a vectorial format if we "flatten" the structure: represent the information in each node as a vector, and obtain the aggregate vector by concatenating the node vectors together. Such techniques have been prevalent in the application of neural network architectures to these problems.
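The flattening described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Node` class, the `flatten` function, the breadth-first visiting order, and all feature values are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A tree node with a label, a feature vector, and child nodes."""
    label: str
    features: List[float]
    children: List["Node"] = field(default_factory=list)


def flatten(root: Node) -> List[float]:
    """Concatenate node feature vectors in breadth-first (level) order."""
    vector: List[float] = []
    queue = [root]
    while queue:
        node = queue.pop(0)
        vector.extend(node.features)   # append this node's features
        queue.extend(node.children)    # visit children after the current level
    return vector


# Hypothetical house tree; the two features per node are made-up values.
house = Node("house", [1.0, 0.5], [
    Node("window", [0.2, 0.1], [Node("pane", [0.05, 0.9])]),
    Node("wall", [0.8, 0.3]),
    Node("door", [0.4, 0.7]),
])

flat = flatten(house)
print(len(flat))  # 5 nodes x 2 features = 10
```

In this sketch the links between nodes do not survive: the flat vector records only the visiting order, and trees with different numbers of nodes produce vectors of different lengths, so padding or truncation would be needed to reach a fixed dimension.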
Similar resources
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Due to the increasing number of XML documents, organizing them effectively in order to retrieve useful information is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
Clustering XML Documents Using Self-organizing Maps for Structures
Self-Organizing Maps capable of encoding structured information will be used for the clustering of XML documents. Documents formatted in XML are appropriately represented as graph data structures. It will be shown that the Self-Organizing Maps can be trained in an unsupervised fashion to group XML structured data into clusters, and that this task scales in linear time with increasing size of...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
Automating XML Markup using Machine Learning Techniques
In this paper we present a novel system for automatically marking up text documents into XML. The system uses the techniques of the Self-Organising Map (SOM) algorithm in conjunction with an inductive learning algorithm, C5.0. The SOM algorithm clusters the XML marked-up documents on a two-dimensional map such that documents having similar content are placed close to each other. The C5.0 algori...
A novel self-organising clustering model for time-event documents
Purpose: Neural document clustering techniques, e.g., self-organising map (SOM) or growing neural gas (GNG), usually assume that textual information is stationary on the quantity. However, the quantity of text is ever-increasing. We propose a novel dynamic adaptive self-organising hybrid (DASH) model, which adapts to time-event news collections not only to the neural topological structure but al...